The story line in the NBA since the Warriors won the 2015 championship has been that basketball is now built around a high volume of accurate three-point shots, shorter and faster players, and ball movement (assists). We want to test whether this story is true and to identify what else in the league may have changed.
Without some knowledge of the league's history, our data and interpretations might seem confusing. The league regularly changes its rules, which inevitably affects how players play. The largest of these changes is the addition of the three-point line in the 1979-80 season; before that, every field goal was worth only 2 points. Other factors that affect our data are player injuries and playing time: a player can put up amazing stats for half a season but, because of an injury, appear as a below-average player for that season. Finally, when this data was recorded the 2010-2019 decade had not yet finished, so the data only goes up to 2017.
The first three questions of this report look at how the type of player in the league has changed. We first look at the age of players and how it affects their play, then examine how the league has changed in regard to player origin, both by the country players come from and by the college they attended. The remaining questions look at changes in how the game is played, starting with an examination of changing shot selection, then investigating how rebounding and height have changed in the league, before finally looking at assists and ball movement.
For these questions we examined how the performance of different age groups has changed over time. We drew box plots for each age group in every decade to see how players of the same age performed across time. We used box plots because they best compare the distribution of player performances.
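The grouping step behind these box plots can be sketched as follows. This is a minimal illustration, not the code used for the report: the function names `decade_label` and `age_group` and the sample records are hypothetical, and the age bands are taken from the table titles in this section.

```python
from collections import Counter


def decade_label(year: int) -> str:
    """Return a decade label such as '1990-1999' for a season year."""
    start = (year // 10) * 10
    return f"{start}-{start + 9}"


def age_group(age: int) -> str:
    """Map a player's age to one of the age bands used in the tables."""
    if age < 20:
        return "under 20"
    if age >= 40:
        return "40 or older"
    low = (age // 5) * 5  # 20, 25, 30 or 35
    return f"{low}-{low + 4}"


# Hypothetical (season year, age) records, not taken from the report's dataset.
records = [(1985, 19), (1985, 23), (1992, 27), (2016, 41), (2016, 22)]

# Count how many records fall into each (decade, age band) cell.
counts = Counter((decade_label(year), age_group(age)) for year, age in records)
```

Each cell of `counts` corresponds to one box in a plot: the performances of the players in that cell form the distribution that the box summarizes.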
| Number of players under 20 | |
|---|---|
| decade | number of players |
| 1970-1979 | 3 |
| 1980-1989 | 1 |
| 1990-1999 | 11 |
| 2000-2009 | 50 |
| 2010-2019 | 41 |
| Number of players age 20-24 | |
|---|---|
| decade | number of players |
| 1970-1979 | 443 |
| 1980-1989 | 1180 |
| 1990-1999 | 1181 |
| 2000-2009 | 1438 |
| 2010-2019 | 1438 |
| Number of players age 25-29 | |
|---|---|
| decade | number of players |
| 1970-1979 | 585 |
| 1980-1989 | 1585 |
| 1990-1999 | 1820 |
| 2000-2009 | 1814 |
| 2010-2019 | 1634 |
| Number of players age 30-34 | |
|---|---|
| decade | number of players |
| 1970-1979 | 193 |
| 1980-1989 | 586 |
| 1990-1999 | 994 |
| 2000-2009 | 1156 |
| 2010-2019 | 837 |
| Number of players age 35-39 | |
|---|---|
| decade | number of players |
| 1970-1979 | 8 |
| 1980-1989 | 43 |
| 1990-1999 | 179 |
| 2000-2009 | 282 |
| 2010-2019 | 212 |
| Number of players age 40 or older | |
|---|---|
| decade | number of players |
| 1980-1989 | 2 |
| 1990-1999 | 8 |
| 2000-2009 | 8 |
| 2010-2019 | 3 |
In the points category, older players today are clearly much worse than older players several decades ago, especially when compared with younger players, who today have much more predictable season point production than in previous decades.
For assists the difference is much less pronounced: older players are still able to impact the game through their ability to pass the ball. However, the data also suggests that passing has become a much more universal skill. The boxes in recent decades are more compressed, meaning that players whose main contribution is passing are much less likely to stand out above the rest of the league.
Interestingly, older and younger players both seem to have seen a decline in rebounding in the most recent decade, which could be due to other changes in how the game is played making rebounding less valuable to team strategy, to rebounds being less available, or to both.
Every player was sorted into a category based on whether their school was one of the 10 winningest programs in US college basketball. These players' performances were then plotted and a regression line was graphed for each group.
One of the major changes the NBA has seen over the years is its development into an international sport. Many countries have their own basketball leagues and the NBA has seen a rise in its own numbers of international players.
| Where players are coming from: By Decade | |||||||
|---|---|---|---|---|---|---|---|
| nationality | 1950-1959 | 1960-1969 | 1970-1979 | 1980-1989 | 1990-1999 | 2000-2009 | 2010-2019 |
| international | 2 | 1 | 7 | 30 | 30 | 50 | 51 |
| US | 159 | 190 | 423 | 572 | 594 | 486 | 445 |
The graphical visualization of this change is interesting because there is clearly a large drop in the total number of players in the league, while the number of international players is slowly increasing.
\(H_0:\ \beta_{\text{elite}} = \beta_{\text{other}}\)
\(H_A:\ \beta_{\text{elite}} \neq \beta_{\text{other}}\)
| decade | se | est | tstat | df | p_value |
|---|---|---|---|---|---|
| 1970-1979 | 0.03298755 | 0.005854540 | 0.1774772 | 748 | 0.8591816 |
| 1980-1989 | 0.02861135 | -0.014726968 | -0.5147247 | 1390 | 0.6068273 |
| 1990-1999 | 0.03468081 | 0.018759172 | 0.5409093 | 1574 | 0.5886466 |
| 2000-2009 | 0.02734783 | 0.005838253 | 0.2134814 | 1582 | 0.8309790 |
| 2010-2019 | 0.02610311 | -0.010278471 | -0.3937642 | 1388 | 0.6938156 |
It is reasonable to combine the elite programs and the non-elite programs to estimate a common slope for a regression line that models average points per game against the number of games played in a season. The hypothesis test finds no discernible difference in the estimated slopes for the two program types: the p-values in every decade lie between 0.59 and 0.86, so we fail to reject \(H_0\), which agrees with the similar-looking lines in the plots. Both variables are reasonable indicators of a player's success in a given decade because scoring in an NBA game matters, and a player's number one asset is their longevity.
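As a rough check on the table above, the `tstat` column can be reproduced as `est / se`. The sketch below also computes a two-sided p-value, but with a standard normal approximation; this is an assumption on our part, since the report's p-values presumably come from a t distribution with the listed degrees of freedom. With several hundred degrees of freedom the two should agree closely. The function name `slope_difference_test` is hypothetical.

```python
import math
from typing import Tuple


def slope_difference_test(est: float, se: float) -> Tuple[float, float]:
    """Return (t statistic, approximate two-sided p-value).

    The p-value uses the standard normal distribution in place of the
    t distribution, which is only reasonable for large degrees of freedom.
    """
    t_stat = est / se
    # For a standard normal Z, P(|Z| > |t|) equals erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Values from the 1970-1979 row of the table (df = 748).
t_stat, p_value = slope_difference_test(est=0.005854540, se=0.03298755)
```

A p-value this far above common significance levels such as 0.05 gives no evidence against \(H_0\).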
TRB is defined as the total number of rebounds grabbed in a season.
This analysis shows two separate graphs, one plotting rebounds against weight and the other plotting rebounds against height.
The blue points are players before 2010 and the red points are players after 2010. In each graph, the green line marks the average weight or height, respectively, and the purple line marks the average total rebounds across players from both before and after 2010. As the graphs show, the blue points (players before 2010) appear above the average rebound line much more often than the red points.
We back this visual analysis up with a numerical analysis as well:
The average heights in the two eras are almost identical, with a difference of only 0.2210944%.
The average weight actually increased between the two eras, by 4.0151002%.
This is an interesting find. Typically, you would expect that if players are getting slightly taller and heavier (presumably through added muscle mass), then rebounding, which is one of the most physical parts of the game and thus tends to favor larger players, would also increase. To ensure a fair comparison, it is important to note that both datasets contain roughly the same number of data points (~8500), so the averages cannot be skewed simply by sheer volume.
This, however, is not the trend: TRB dropped between the two eras, by 7.372055%. We can see the same trend visually in the graphs.
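The percentage differences quoted above follow the usual relative-change formula. The sketch below shows the arithmetic; the two era averages are hypothetical placeholders, since the underlying means are not listed in this report.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical era averages of total rebounds per season, for illustration only.
avg_trb_before_2010 = 200.0
avg_trb_after_2010 = 185.0

# A negative result means the average dropped in the later era.
trb_change = percent_change(avg_trb_before_2010, avg_trb_after_2010)
```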
In the weight graph, the pre-2010 points appear above the average total rebounds far more often, indicating that before 2010, heavier-set players tended to grab more rebounds. When it comes to height, however, the data seems much more even.
We further explore the reasoning for why size may not matter as much in rebounding in the new era, and why rebounding as a whole has decreased.
| 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 |
|---|---|---|---|---|---|---|---|---|---|
| 0.4463373 | 0.4537554 | 0.4527946 | 0.4682776 | 0.4722661 | 0.46684 | 0.4781163 | 0.4714633 | 0.4785081 | 0.4839432 |
| 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 |
|---|---|---|---|---|---|---|---|---|---|
| 0.5044521 | 0.4934754 | 0.485293 | 0.4862767 | 0.4811782 | 0.4745481 | 0.4873821 | 0.488513 | 0.4855409 | 0.4796472 |
Combining the analysis of the previous two questions we see an interesting trend:
eFG% is Effective Field Goal Percentage; the formula is (FG + 0.5 * 3P) / FGA. This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal. In other words, it is a more accurate representation of how efficiently a player scores.
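The eFG% formula can be written as a small function. This is a sketch: the function name `effective_fg_pct` and the example stat line are hypothetical and not drawn from the dataset.

```python
def effective_fg_pct(fg: int, three_p: int, fga: int) -> float:
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA.

    fg      -- field goals made (two-pointers and three-pointers)
    three_p -- three-point field goals made
    fga     -- field goals attempted
    """
    if fga <= 0:
        raise ValueError("fga must be positive")
    return (fg + 0.5 * three_p) / fga


# Hypothetical season line: 400 makes, 100 of them threes, on 900 attempts.
plain_fg_pct = 400 / 900
efg = effective_fg_pct(fg=400, three_p=100, fga=900)
```

For this hypothetical line the plain field goal percentage is about 0.444, while eFG% is 0.5, because each made three counts as one and a half makes.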
For numerical confirmation, we list the first 10 years of the dataset and the last 10 years and notice the same trend.
For this problem, we analyze the three-point percentage for each decade and draw a box plot of it. Three-point percentage is defined here as the share of attempted three-pointers that players made. The box plot shows a clear pattern: three-point percentage has increased steadily over the years.
For this analysis, we look at assists in terms of the number of times a player assisted others. We believe that players who give more assists are part of teams that have more ball movement and thus more assists overall. We also believe that players who shoot at a high volume and a high hit rate are more skilled and therefore more likely to assist others as well.
| 3 Point Hit rate | Average Num of assists |
|---|---|
| 0-24% | 109.87582 |
| 25-49% | 165.05613 |
| 50-74% | 65.60571 |
| 75-100% | 50.92308 |
From the graph, players with a three-point percentage between 25% and 49% have more assists than players in any other three-point percentage category. This makes sense because the players in the 50-74% and 75-100% categories are likely to have played very few games, which makes their total assists much lower. In contrast, players with a three-point percentage of 0-24% or 25-49% are the most common, so they play many games and have higher assist totals.
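The bucketing used for the hit-rate table can be sketched as below. The function names and sample rows are hypothetical, and the treatment of bucket edges (for example, whether exactly 25% belongs to the second bucket) is an assumption, since the report does not state it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hit_rate_bucket(three_pct: float) -> str:
    """Map a three-point percentage (a fraction from 0 to 1) to its category."""
    if three_pct < 0.25:
        return "0-24%"
    if three_pct < 0.50:
        return "25-49%"
    if three_pct < 0.75:
        return "50-74%"
    return "75-100%"


def average_by_bucket(rows: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Average a season total (such as assists) within each hit-rate category."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for three_pct, value in rows:
        bucket = hit_rate_bucket(three_pct)
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


# Hypothetical (three-point percentage, season assists) rows for illustration.
rows = [(0.10, 100.0), (0.35, 200.0), (0.40, 100.0), (1.00, 10.0)]
averages = average_by_bucket(rows)
```

A player who makes their only three-point attempt of the season lands in the 75-100% category, which illustrates why the top categories are dominated by low-volume players.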
| 3 Point Hit rate | Average Num of points/Season |
|---|---|
| 0-24% | 529.6972 |
| 25-49% | 657.7919 |
| 50-74% | 360.1257 |
| 75-100% | 354.0513 |
| Highest total number of threes in a single season in NBA history by a Single Player | |||
|---|---|---|---|
| Order | Number of Threes | Season | Player |
| 1 | 402 | 2016 | Stephen Curry |
| 2 | 324 | 2017 | Stephen Curry |
| 3 | 286 | 2015 | Stephen Curry |
| 4 | 276 | 2016 | Klay Thompson |
| 5 | 272 | 2013 | Stephen Curry |
| 6 | 269 | 2006 | Ray Allen |
| 7 | 268 | 2017 | Klay Thompson |
| 8 | 267 | 1996 | Dennis Scott |
| 9 | 262 | 2017 | James Harden |
| 10 | 261 | 2014 | Stephen Curry |
The above box plot shows a pattern similar to the previous one, which related the number of assists to three-point percentage. Players with a three-point hit rate of 25-49% have the highest average and the highest total points. Again, the explanation is that players with a higher three-point hit rate tend to have played fewer games than those with a relatively lower hit rate. The pattern is fairly consistent across decades.
| Average Win Shares per 48 minutes by position by decade | |||||
|---|---|---|---|---|---|
| decade | C | PF | PG | SF | SG |
| 1970-1979 | 0.09972996 | 0.08990830 | 0.06837850 | 0.08127536 | 0.06493116 |
| 1980-1989 | 0.08343235 | 0.08354468 | 0.07183546 | 0.08007736 | 0.06901453 |
| 1990-1999 | 0.08007875 | 0.08594586 | 0.07800942 | 0.08140076 | 0.07111029 |
| 2000-2009 | 0.08746123 | 0.08995385 | 0.06854902 | 0.07947685 | 0.07353237 |
| 2010-2019 | 0.10818590 | 0.09437515 | 0.06650235 | 0.07210432 | 0.06657292 |
In this graph, we track the relative importance of each position through the decades by graphing win shares by decade and by position. We measure this through WS/48, which is defined as an estimate of the number of wins contributed by the player per 48 minutes.
The data shows that the relative importance of each position has fluctuated from decade to decade.
The original question of this report was: how has the league changed in the last 50 years, both demographically and in terms of how it is played? We believe the league has changed substantially in terms of composition and style of play.
From question 1 we can see that younger players are better than older players in the three major statistical categories. Even in assists, where younger and older players are comparable, passing skills have become more universal (hence the compressed boxes), making it harder for older players to stand out purely on passing skill. Rebounding seems to be a dying skill in our data: the total number of rebounds is decreasing for every age group, but younger players are still better rebounders than older players. The decline of rebounding is quite interesting, as it is generally considered one of the core skills in basketball. This change in rebounding totals could be due either to changes in how teams strategize or to changes in how the game is played.
Question 2a shows that there is a clear drop in the number of players in the league, along with a slow increase in the number of international players. So as the number of available roster spots gets smaller and competition for these spots gets fiercer, international players are winning them at a steadily growing rate. This could partly explain why the league has changed, because international basketball, particularly EuroLeague basketball, has slightly different rules than the NBA and thus is played differently. As players from these different backgrounds come into the NBA, they bring their playstyles with them.
Given what we found regarding the decreasing number of available NBA roster positions, it would make sense that only the best domestic players would be able to play. From this we might expect the difference in performance between players from elite colleges and players from non-elite colleges to be most pronounced in the most recent decades; however, this is not true. We see that elite players in the most recent decade, while still superior, are less superior than in previous decades. One possible conclusion is that the changes in how basketball is played have made the NBA more accessible to players of all skill types, not just players who dominate at the collegiate level. This is consistent with the large p-values from the slope test (roughly 0.59 to 0.86 across decades, and about 0.7 in the most recent decade), which mean we fail to reject the null hypothesis that the slopes for elite schools and other schools are equal. The visual comparison of the linear models points the same way.
One of the major changes that every NBA analyst has been talking about is three-point shooting. From our data it is clear that shooting in general is increasing, and much of this increase is due to the increased number of three-pointers. The number of three-pointers is growing both in raw numbers and as a proportion of total field goals, even while each decade surpasses the previous ones in total field goals.
In question 4 we find that rebounding has changed overall. Total rebounds for players in the last 10 years have gone down across every height and weight group, compared to before 2010.
At the same time, the effective field goal percentage by decade has increased substantially in the last 10 years. The above graph indicates that eFG% remained relatively constant right up until about 2010. After that point, eFG% reached higher levels than ever before, which indicates that players are becoming more efficient scorers overall. This resolves the puzzle from the previous question: even though players are about the same height and heavier in recent years, the number of rebounds has decreased. The answer is that the ball is going in the hoop more often, so there is less need to rebound it. Three-point percentage and the corresponding assist metrics have also gone up in the last 10 years, indicating that the league is moving toward more shooting, and more efficient shooting.
From question 5, we see that three-point percentage has been increasing gradually over the years. This supports our thesis that the league has been changing over time in terms of three-point shooting. Also, 25-49% is the most common interval for three-point percentage. Players within this interval have the highest total points and number of assists compared to other intervals. This makes sense because players with a 25-49% three-point percentage tend to play many more games than players in other intervals, which gives them the highest points and assists in a season.
Finally, we look at how the league has changed in terms of which positions are most valuable. In the 1990s, in the age of players like John Stockton, Michael Jordan, Scottie Pippen, and Penny Hardaway, win shares were more evenly distributed among the 5 basic positions because great players spanned all positions. In the 1970s, in the age of players like Kareem Abdul-Jabbar and Julius Erving ("Dr. J"), center was an incredibly valued position.
A slightly odd trend we notice is the value of centers in the 2010-2019 decade. The beginning of this decade saw great centers being used in incredibly efficient roles, such as Dwight Howard, Rudy Gobert, Marc Gasol, and DeMarcus Cousins. However, the second half of the decade, by the eye test, has been dominated by guards and forwards. Unfortunately, the dataset only contains data up until the 2017 NBA season, so these guards and their win shares are underrepresented in the data, which we note is a limitation of the dataset.
We acknowledge that the dataset we used is not perfect, as it lacks data from after 2017. In addition, even basketball pundits will admit that the advanced stats currently available and widely used are not perfect; metrics like Player Efficiency Rating, for example, do not really value a player's defensive prowess.
For future work on this topic, we would like to use data running through the 2020 season to see whether the trends have changed or whether any differences have become more pronounced. Specifically, we would also like to explore the relationship between a player's draft position and their long-term career outcome, or do a longevity study that tracks nationally recognized players from high school to the end of their careers.
Ultimately, what we have found is that the league in recent years is drastically different than in previous decades. This is substantiated by several points of analysis. 1) The difference in performance between old players and young players is far more pronounced recently than in decades past. 2) International players make up a larger share of the league, bringing along their own styles of play. 3) There is a huge difference in shot selection across the league, with the three-pointer becoming far more popular recently than in decades past. 4) Rebounding, a huge part of the game, has decreased in the last 10 years compared to previous decades, while the size of players remains relatively unchanged. This led us to the conclusion that there is less rebounding because players are becoming better shooters, which is supported by our analysis of eFG%. 5) Players are becoming a lot better at shooting three-pointers. We also argued that assists are a valid indicator of ball movement on a team, and that teams that shoot at better clips are also assisting more. 6) There is a much larger difference in win shares per position nowadays than in years past.